ABSTRACT

Pattern  recognition  schemes  have  been  concerned,  mainly, 
with  the  problem  of  identifying  alphabetic  characters,  or 
numerals.  To  this  end  adaptive  pattern  recognition  devices  have 
been  trained  to  recognize  such  characters.  The  topic  discussed 
here  is  the  training  of  an  adaptive  pattern  recognition  device, 
Adaline,  to  mimic  the  performance  of  the  controller  of  a  plant. 

This study discusses Adaline and the minimum square error
method of adaption. Adaline is trained by observing the behaviour
of the controller in numerous situations. The problem of
coding the information to be presented to Adaline is discussed
and, finally, a suitably trained Adaline takes control of the
plant.

INTRODUCTION

The problem of automatic character recognition has received
a great deal of attention recently. Many schemes have been
proposed and they can be divided into two main groups, "open
loop" types and "closed loop" types. Open loop schemes compare
the character to be recognized with a library stock and make
a decision on this basis. These schemes are very useful when
the characters are essentially standard, possibly typewritten.
Closed loop schemes are trained to recognize characters by
applying as many patterns as possible to the machine while
"teaching" it to generate the correct answers before asking it
to operate alone. These schemes have been described by such
names as "linear decision network" or "adaptive linear neuron"
(Adaline). Their advantage over open loop schemes is that
they are able to attempt the classification of patterns other
than those on which the machine has been trained.

The  latter  scheme  has  application  to  process  control. 
The  human  operator  of  a  steel  rolling  mill  is  presented  with 
such  data  as  speed  and  temperature  of  the  incoming  billet. 
On  the  basis  of  this  knowledge  and  past  experience  he  can 
adjust  the  rolls  to  produce  the  desired  sheet  steel.   A  pattern 
recognition  device  could  be  placed  beside  him  and  be  trained 
to  recognize  suitably  coded  patterns  of  information  about 
the  billet  and  thus  to  produce  the  same  "response"  as 
the  operator  in  control  of  the  rolls. 


The present investigation considers a simple relay activated
plant, which is already controllable, and considers the training
of a pattern recognition device to produce the same result as
the relay controller. The block diagram of a stable plant
is shown in Figure (a), where $d(t)$ is the control signal which
causes the relay to apply driving power to the plant. If the
linear control system error is defined to be the difference between
desired and actual output at any given time, then a convenient way
of describing the behaviour of a plant is to consider the
system error and its derivatives as functions of time.

Figure (a)

The pattern recognizer is, therefore, presented with suitably
coded patterns of error, error rate (and higher derivatives
if necessary) and it is then trained to produce an output
signal, $a(t)$, close to $d(t)$. This is indicated in Figure (b).

Figure (b)

Training involves presenting the pattern recognizer with as many
typical operating conditions as possible, while adjusting it
so that $a(t)$ matches $d(t)$ as closely as possible over the
entire range of conditions. After the training period the
compensator can be disconnected as the learning machine takes
its place. This is shown in Figure (c).

Figure (c)

The  present  study  discusses  the  properties  of  the  pattern 
recognition  device  (in  Chapter  I)  and  a  specific  method  of 
adaption (in Chapter II). The training of the device, the coding
of  error  and  error  rate  and  the  control  of  the  plant  by  the 
learning  machine  are  discussed  in  Chapter  III.   The  conclusions 
appear  in  Chapter  IV. 


CHAPTER  I 
PATTERN  RECOGNITION 


An  adaptive  pattern  recognition  device  of  the  type  to 
be  considered  has  four  essential  components:   a  sensory  unit, 
an  association  unit,  a  response  unit  and  an  adjustment  unit. 
The  whole  device  is  shown  in  Figure  1.   It  is  expected  of 
such  a  device  that  it  can  be  trained  to  recognize  stimuli  or 
patterns  which  are  part  of  the  environment  in  which  it  is  placed. 


Figure 1

The  sensory  unit  is  a  transducer  which  produces, 
possibly,  a  set  of  electrical  signals  in  response  to  a  visual 
or  audio  pattern.   If  the  patterns  which  are  to  be  recognized 
are  alphabetical  or  numerical,  then  they  could  be  displayed 
on  a  matrix  of  photo-cells  which  could,  in  turn,  generate 
positive  or  negative  voltages,  depending  on  the  presence 
or  absence  of  an  element  of  the  pattern.   This  is  shown  in 
Figure  2. 


Throughout the remainder of the discussion the characteristics
of the transducer will be bypassed and the pattern, or input,
for the association unit will be considered to be simply
an array of positive and negative voltages of unit magnitude.

The association unit is a logical decision element
which produces an output on receipt of the input pattern.
It will be assumed to consist of a set of adjustable weights
of number $n + 1$ when the number of elements of the input pattern
is $n$. The $n$ elements of the input pattern are supplied
to the weights $w_1, w_2, w_3, \ldots, w_n$. The weight, $w_0$, is called
the threshold and its input is fixed at $+1$. The values of
the weights and threshold are determined by previous training.
In this study all the weights and threshold will have any
past experience removed by setting them to zero before
a training sequence begins. If the elements of a pattern
are supplied to the device, the sum of the outputs of the
weights and the threshold is called the analogue output, $a$.
This is shown in Figure 3.


Figure 3


The  function  of  the  response  unit  is  to  indicate  which 
decision  has  been  made  on  the  pattern  applied  to  the  sensor. 
In  this  discussion  the  characteristics  of  the  response  unit 
will be bypassed and the analogue output, $a$, will be used
as an indication of the pattern which has been applied.

These ideas have been discussed by many authors [1, 2, 3, 4, 5].
Widrow has named the device "Adaline" (Adaptive
linear neuron).

1.1 The Adaptive Mechanism

The problem of pattern recognition using Adaline
could be stated in the following way: given $m$ input
patterns, each having $n$ pattern elements, separate them
into two classes, some of the patterns being mapped to
positive values of analogue output and the remainder
to negative values of analogue output.

Each input pattern can be thought of as a column vector,

$$[x]_i = \begin{bmatrix} x_{i1} \\ x_{i2} \\ \vdots \\ x_{in} \end{bmatrix}$$

with $x_{ij} = \pm 1$, and $[x]_i$ the $i$th pattern of $m$. If the $i$th
pattern, $[x]_i$, is applied to the device, the resulting analogue
output, $a_i$, is

$$a_i = w_1 x_{i1} + w_2 x_{i2} + \cdots + w_j x_{ij} + \cdots + w_n x_{in} + w_0 = \sum_{j=1}^{n} w_j x_{ij} + w_0$$

where $w_0$ and $w_1, w_2, \ldots, w_j, \ldots, w_n$ are the threshold and
set of weights respectively. To be a competent pattern recognizer
Adaline must be trained, in some way, to make $a_i$ match $d_i$,
where $d_i$ is the required or desired output for the $i$th
pattern. After training, which entails adaption of the weights,
it is hoped that when presented with a pattern, $[x]_i$, Adaline
will make the correct classification by generating an output
as nearly as possible equal to $d_i$.

If $(d_i - a_i)$ is defined as the analogue error, $e_i$,
any training scheme which will make all the $e_i$ small is worth
considering. The adaption scheme studied here attempts to
minimize the sum of the squares of the errors for all the
patterns presented, i.e. the scheme minimizes $\sum_{i=1}^{m} e_i^2$. This
topic is discussed fully in Chapter II. Other adaption schemes
are discussed by Treado [6]. For each pattern, $[x]_i$, the
analogue error, $e_i$, is measured. An adjustment of equal
magnitude is then made to each weight. This adjustment is
proportional to the error, $e_i$, and is of such a sign as will reduce
$e_i$ and $\sum e_i^2$. Thus the change to the weight, $w_j$, at each step
is given by:

$$\Delta w_j = q\, e_i\, x_{ij} \qquad (1.1)$$

and for the threshold, $w_0$, it is:

$$\Delta w_0 = q\, e_i$$

where $q$ is a constant and where it will be assumed that the
weights can be varied continuously. The total adjustment
after all patterns have been presented once is:

$$\Delta w_j = q \sum_{i=1}^{m} e_i\, x_{ij} \qquad (1.2)$$

and

$$\Delta w_0 = q \sum_{i=1}^{m} e_i$$

The adjustments of equation 1.1 are made if $a_i$ is not equal
to $d_i$. Adjustments could, therefore, continue until the
analogue error, $e_i$, is zero. Hence the minimum of $\sum_{i=1}^{m} e_i^2$
found, if training is carried on for long enough, will be
zero. (See also Chapter II.)
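The single-pattern adjustment of equation 1.1 can be sketched in a few lines of Python. This is only an illustration, assuming inputs of +1 or -1 and continuously variable weights as in the text; the names `analogue_output` and `adapt_once` are hypothetical and are not taken from the original program.

```python
def analogue_output(weights, threshold, pattern):
    """Sum of the weighted inputs plus the threshold (whose input is fixed at +1)."""
    return sum(w * x for w, x in zip(weights, pattern)) + threshold


def adapt_once(weights, threshold, pattern, desired, q):
    """Apply equation 1.1 for one pattern; return new weights, new threshold and the error."""
    error = desired - analogue_output(weights, threshold, pattern)
    new_weights = [w + q * error * x for w, x in zip(weights, pattern)]
    new_threshold = threshold + q * error
    return new_weights, new_threshold, error


# Illustrative values only: a two-element pattern, desired output 10, q = 0.1.
weights, threshold = [0.0, 0.0], 0.0
pattern, desired = [1, -1], 10.0
weights, threshold, error = adapt_once(weights, threshold, pattern, desired, q=0.1)
residual = desired - analogue_output(weights, threshold, pattern)
```

With two weights and a threshold, one adjustment changes the output for the presented pattern by q times the error times three, so the error on that pattern shrinks but is not removed in a single step.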

Three aspects of equation 1.2 will be examined in a
later section:

1. The speed with which the error, $e_i$, is reduced as a
succession of training patterns is presented and the
dependence of this speed of convergence on $q$.

2. The fact that the error, $e_i$, in response to one
of the $m$ patterns, may be too large even though $\sum e_i^2$
is acceptably small, but not zero.

3. The fact that the constant, $q$, can take on values
which make $\sum e_i^2$ diverge. If the constant, $q$, takes
the value $\alpha/(n+1)$, where $n$ is the number of weights and
$\alpha$ is a constant, Mays [7] showed that $0 < \alpha < 2$ for
convergence.
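A bound of this kind can be made plausible by a one-pattern calculation. The following is only a sketch, assuming that a single pattern $[x]_i$ with elements $x_{ij} = \pm 1$ is adjusted by equation 1.1 and writing $\alpha = q\,(n+1)$; it is not Mays' general argument for a sequence of different patterns.

```latex
\begin{aligned}
\Delta a_i &= \sum_{j=1}^{n} \Delta w_j\, x_{ij} + \Delta w_0
            = q\, e_i \Bigl( \sum_{j=1}^{n} x_{ij}^{2} + 1 \Bigr)
            = q\,(n+1)\, e_i , \\
e_i' &= d_i - (a_i + \Delta a_i)
      = \bigl( 1 - q\,(n+1) \bigr)\, e_i
      = (1 - \alpha)\, e_i .
\end{aligned}
```

The error on the presented pattern therefore shrinks in magnitude only when $|1 - \alpha| < 1$, that is when $0 < \alpha < 2$. For $n = 16$ weights this corresponds to $q < 2/17 \approx 0.118$.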
1.2  Separability  of  Patterns 

Adaline  can  be  considered  to  be  the  realization  of  a 
linear  decision  function.   Input  patterns  can  be  regarded 
as sets of points in $n$-space and a linear decision function
is any partitioning of the space by a hyperplane of dimension
$n - 1$. The pattern recognition problem requires the selection
of a set of weights which will define an appropriate hyperplane.
In an adaption scheme which uses the mean square error adaption
rules the weights defining a separating hyperplane are those
which cause $\sum e_i^2$, and the analogue errors, $e_i$, for each
pattern, $[x]_i$, to be zero. Hence weights must be chosen to
satisfy the following equations:

$$\begin{aligned}
x_{11} w_1 + x_{12} w_2 + \cdots + x_{1j} w_j + \cdots + x_{1n} w_n + w_0 &= d_1 \\
x_{21} w_1 + x_{22} w_2 + \cdots + x_{2j} w_j + \cdots + x_{2n} w_n + w_0 &= d_2 \\
&\;\;\vdots \\
x_{i1} w_1 + x_{i2} w_2 + \cdots + x_{ij} w_j + \cdots + x_{in} w_n + w_0 &= d_i \\
&\;\;\vdots \\
x_{m1} w_1 + x_{m2} w_2 + \cdots + x_{mj} w_j + \cdots + x_{mn} w_n + w_0 &= d_m
\end{aligned} \qquad (1.3)$$



The iterative training scheme of section 1.1, if carried to
its conclusion, will yield a set of weights defined by the
above equations if it is possible to map the patterns,
$[x]_1, [x]_2, \ldots, [x]_i, \ldots, [x]_m$, to outputs of
$d_1, d_2, \ldots, d_i, \ldots, d_m$. If it is possible to map all the patterns, with which
Adaline is trained, to the desired outputs then the patterns
are said to be separable. After training has been completed
with a set of separable patterns, the values found for the
threshold and the weights are fixed at $w_0, w_1, w_2, \ldots, w_n$
and the equation of the hyperplane dividing $n$-space is,
therefore,

$$x_1 w_1 + x_2 w_2 + \cdots + x_j w_j + \cdots + x_n w_n + w_0 = 0 \qquad (1.4)$$

where $x_1, x_2, \ldots, x_n$ are the coordinate axes of $n$-space.


If the $k$th pattern, $[x]_k$, with coordinates $x_{k1}, x_{k2}, \ldots, x_{kn}$
in $n$-space, is now presented, it should yield an analogue
output,

$$a_k = x_{k1} w_1 + x_{k2} w_2 + \cdots + x_{kn} w_n + w_0 \qquad (1.5)$$

which is approximately equal to the desired output, $d_k$.
Equation 1.5 is the equation of a hyperplane on one side of
the dividing hyperplane given by equation 1.4. Hence another
definition of separability is that a set of patterns can be
said to be separable if one group of them lie on one or more
parallel hyperplanes on one side of, and parallel to, the
dividing hyperplane and the other group lie on one or more
parallel hyperplanes on the other side of, and parallel to,
the dividing hyperplane. If a further restriction is placed
on the minimum mean square error adaption scheme, the definition
of separability is even simpler. Consider an example with $m$
patterns (as described in equation 1.3) where $k_1$ of them
have desired outputs of $+d$ and $k_2$ of them have desired outputs
of $-d$ (and $k_1 + k_2 = m$). If the patterns are separable, then
the threshold and weights, $w_0, w_1, w_2, \ldots, w_n$, can be found.
Further, after training, and on presenting the $k_1$ patterns
to these weights, a hyperplane on one side of the dividing
hyperplane is defined. If the $k_2$ patterns are now presented
a hyperplane on the other side of the dividing hyperplane
is defined. Both hyperplanes are parallel to the dividing
hyperplane and equal distances from it. These ideas can
be clarified if specific examples in 2-space are considered.
In 2-space Adaline consists of two weights, $w_1$ and $w_2$, and
a threshold, $w_0$. In the example illustrated in Figure 4 there
are two patterns:

$$[x]_1 = \begin{bmatrix} 1 \\ -1 \end{bmatrix},\; d_1 = d; \qquad [x]_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix},\; d_2 = -d$$

From equation 1.3 the equations defining the weights required
for separation are:

$$\begin{aligned}
w_1 - w_2 + w_0 &= d \\
-w_1 + w_2 + w_0 &= -d
\end{aligned}$$

These can be solved only by choosing a value for one of the
weights and solving for the other two. The equations are
consistent* but yield an infinite number of dividing lines
which pass through the point $x_1 = 0$, $x_2 = 0$.

*The rank of the coefficient matrix is equal to the rank
of the augmented matrix.

In the example illustrated in Figure 5 there are three
patterns:

$$[x]_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix},\; d_1 = d; \qquad [x]_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix},\; d_2 = -d; \qquad [x]_3 = \begin{bmatrix} -1 \\ 1 \end{bmatrix},\; d_3 = -d$$

The equations defining the weights required for separation
are:

$$\begin{aligned}
w_1 + w_2 + w_0 &= d \\
w_1 - w_2 + w_0 &= -d \\
-w_1 + w_2 + w_0 &= -d
\end{aligned}$$

These can be solved and unique values of $w_0$, $w_1$, $w_2$ are
found to be $w_0 = -d$, $w_1 = d$, $w_2 = d$. Hence the equation of
the dividing line is:

$$x_2 = 1 - x_1$$

The two parallel hyperplanes for each class of patterns are
indicated in Figure 5.

Two inseparable cases are now considered. In the case
which is illustrated in Figure 6 there are four patterns:

$$[x]_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix},\; d_1 = d; \quad [x]_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix},\; d_2 = -d; \quad [x]_3 = \begin{bmatrix} -1 \\ 1 \end{bmatrix},\; d_3 = -d; \quad [x]_4 = \begin{bmatrix} -1 \\ -1 \end{bmatrix},\; d_4 = -d$$

Figure 6

The four equations to be solved for the weights are:

$$\begin{aligned}
w_1 + w_2 + w_0 &= d \\
w_1 - w_2 + w_0 &= -d \\
-w_1 + w_2 + w_0 &= -d \\
-w_1 - w_2 + w_0 &= -d
\end{aligned}$$

and they are inconsistent.* A dividing line cannot be placed
between them and satisfy the given conditions.

Another inseparable case is shown in Figure 7. There
are four patterns:

Figure 7

The resulting four equations are inconsistent and cannot
be solved for the weights. It is obvious, from Figure 7,
that one line cannot separate the patterns.

*The rank of the coefficient matrix is different from
the rank of the augmented matrix.

It should be emphasised that the examples discussed use
minimum mean square error adaption. Since this scheme requires
convergence to a desired value of output (with the accompanying
precise positioning of the hyperplane) it makes complete
separation of apparently separable patterns (such as shown
in Figure 6) impossible. Other "less precise" schemes, as
discussed by Treado [6], do not encounter some of these
difficulties.

The  problem  of  separability  will  be  mentioned  again  in 
Chapter  III  where  coding  is  considered. 
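The rank criterion quoted in the footnotes (coefficient matrix against augmented matrix) can be checked mechanically for small cases. The sketch below is only an illustration: the helper names `rank` and `is_separable` are hypothetical, the value d = 10 is arbitrary, and the pattern coordinates are read off the signs of the coefficients in the three-pattern and four-pattern equations of this section.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a small matrix by Gaussian elimination over exact fractions."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    found = 0
    n_cols = len(rows[0]) if rows else 0
    for col in range(n_cols):
        pivot = next((r for r in range(found, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[found], rows[pivot] = rows[pivot], rows[found]
        for r in range(len(rows)):
            if r != found and rows[r][col] != 0:
                factor = rows[r][col] / rows[found][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[found])]
        found += 1
    return found


def is_separable(patterns, desired):
    """True when equations 1.3 are consistent, i.e. every desired output can be met exactly."""
    coefficients = [list(p) + [1] for p in patterns]  # last column multiplies the threshold
    augmented = [row + [d] for row, d in zip(coefficients, desired)]
    return rank(coefficients) == rank(augmented)


d = 10  # arbitrary desired magnitude, chosen for illustration
three_patterns = [(1, 1), (1, -1), (-1, 1)]
four_patterns = three_patterns + [(-1, -1)]
separable_three = is_separable(three_patterns, [d, -d, -d])
separable_four = is_separable(four_patterns, [d, -d, -d, -d])
```

Note that "separable" here is used in the strict sense of this section: the outputs must equal the desired values exactly, which is a stronger requirement than the existence of some dividing line.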
1.3  Experimental  Simulation  of  Adaline 

A computer program was written to simulate Adaline and to test
its ability to separate pattern sets. Two pattern sets were
applied: a group of five 'C's was to produce an output of
+10 and a group of five 'T's to produce an output of -10.
The patterns and resultant inputs to Adaline are shown in
Figure 8. During the training phase one of the patterns was
presented and an appropriate adjustment made to the weights.
Each of the ten patterns was then presented and the error, $e_i$,
in each case was measured; $\sum e_i^2$ was then calculated. This
process was carried out 100 times. The value of the constant,
$q$, in equation 1.2 was initially chosen to be 0.05.
A graph of $\sum e_i^2$ against number of adaptions is shown
in Figure 9. This has been called a "learning curve" by
Widrow [5]. It indicates how many presentations of the
patterns are necessary before Adaline is able to recognize
all of the patterns with small error. From the learning
curve shown it can be seen that the quantity, $\sum e_i^2$, has

Patterns mapped to +10

  (1)     (2)     (3)     (4)     (5)
 XXX.    .XXX    XXXX    .XXX    ....
 X...    .X..    X..X    ...X    X..X
 X...    .X..    X..X    ...X    X..X
 XXX.    .XXX    ....    .XXX    XXXX

(1)    1  1  1 -1   1 -1 -1 -1   1 -1 -1 -1   1  1  1 -1
(2)   -1  1  1  1  -1  1 -1 -1  -1  1 -1 -1  -1  1  1  1
(3)    1  1  1  1   1 -1 -1  1   1 -1 -1  1  -1 -1 -1 -1
(4)   -1  1  1  1  -1 -1 -1  1  -1 -1 -1  1  -1  1  1  1
(5)   -1 -1 -1 -1   1 -1 -1  1   1 -1 -1  1   1  1  1  1

Patterns mapped to -10

  (6)     (7)     (8)     (9)     (10)
 XXX.    ....    ....    ....    ....
 .X..    .X..    .XXX    ...X    X...
 .X..    .X..    ..X.    .XXX    XXX.
 ....    XXX.    ..X.    ...X    X...

(6)    1  1  1 -1  -1  1 -1 -1  -1  1 -1 -1  -1 -1 -1 -1
(7)   -1 -1 -1 -1  -1  1 -1 -1  -1  1 -1 -1   1  1  1 -1
(8)   -1 -1 -1 -1  -1  1  1  1  -1 -1  1 -1  -1 -1  1 -1
(9)   -1 -1 -1 -1  -1 -1 -1  1  -1  1  1  1  -1 -1 -1  1
(10)  -1 -1 -1 -1   1 -1 -1 -1   1  1  1 -1   1 -1 -1 -1

Figure 8

Figure 9(a)
Learning Curve for q = 0.05

dropped rapidly, after 20 adaptions, from the large initial
value of 1000 to a value of 4?. This might seem acceptable
but the use of $\sum e_i^2$ as a criterion is dangerous since it may
be generated almost entirely by one or two unacceptably large
errors. Training can be considered complete only when all
the errors are zero or, if this requires too many adaptions,
when the error for each pattern lies within an acceptable
limit.
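The experiment can be approximated with a short simulation. This is a sketch under stated assumptions rather than the original program: the ten patterns are transcribed from Figure 8, the weights and threshold start at zero, one pattern is adjusted per adaption, and the patterns are assumed to be presented cyclically in the order (1) to (10), which the text does not specify. All function and variable names are hypothetical.

```python
# Patterns of Figure 8 as 4x4 grids: 'X' is an element that is present (+1),
# '.' an element that is absent (-1).
C_GRIDS = [
    ("XXX.", "X...", "X...", "XXX."),
    (".XXX", ".X..", ".X..", ".XXX"),
    ("XXXX", "X..X", "X..X", "...."),
    (".XXX", "...X", "...X", ".XXX"),
    ("....", "X..X", "X..X", "XXXX"),
]
T_GRIDS = [
    ("XXX.", ".X..", ".X..", "...."),
    ("....", ".X..", ".X..", "XXX."),
    ("....", ".XXX", "..X.", "..X."),
    ("....", "...X", ".XXX", "...X"),
    ("....", "X...", "XXX.", "X..."),
]


def to_inputs(grid):
    """Flatten a grid row by row into sixteen inputs of +1 or -1."""
    return [1 if cell == "X" else -1 for row in grid for cell in row]


def output(weights, threshold, pattern):
    return sum(w * x for w, x in zip(weights, pattern)) + threshold


def sum_squared_error(weights, threshold, patterns, desired):
    return sum((d - output(weights, threshold, x)) ** 2 for x, d in zip(patterns, desired))


def train(patterns, desired, q, adaptions):
    """Adjust on one pattern per adaption and record the sum of squared errors each time."""
    weights, threshold = [0.0] * len(patterns[0]), 0.0
    history = [sum_squared_error(weights, threshold, patterns, desired)]
    for step in range(adaptions):
        i = step % len(patterns)  # assumed cyclic order of presentation
        error = desired[i] - output(weights, threshold, patterns[i])
        weights = [w + q * error * x for w, x in zip(weights, patterns[i])]
        threshold += q * error
        history.append(sum_squared_error(weights, threshold, patterns, desired))
    return weights, threshold, history


PATTERNS = [to_inputs(g) for g in C_GRIDS + T_GRIDS]
DESIRED = [10.0] * 5 + [-10.0] * 5
weights, threshold, history = train(PATTERNS, DESIRED, q=0.05, adaptions=100)
print(history[0], history[20], history[-1])
```

The list `history` plays the role of the learning curve; its first entry is the sum of squared errors before any adaption, when every output is zero. Because the order of presentation is assumed, later entries need not reproduce the values read from Figure 9.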

The five 'C's and five 'T's were used again as test
patterns to find the effect of the constant, $q$, on the rate
of adaption. Values of $\sum e_i^2$, after various numbers of iterations,
plotted against $q$ are shown in Figure 10. The "best" value
of $q$, that producing the smallest $\sum e_i^2$ after a fixed number
of iterations, is found to be $q = 0.06$. If $q = \alpha/(n+1)$, where $\alpha$ is
a constant and $n$ is the number of weights, the best value
of $\alpha$ is 1.02 for an Adaline with 16 weights.

Learning curves for other values of $q$ are shown in
Figures 11 and 12. For small values of $q$ the adjustments
are small and $\sum e_i^2$ is reduced with few fluctuations. When
$q$ approaches the maximum permissible value, the adjustments
are large and $\sum e_i^2$ experiences large fluctuations before a
suitably low value is approached. For values of $q$ greater
than 0.1167 the adjustments are too large and $\sum e_i^2$ diverges.
The problem of large analogue errors "concealed" by what
seems an acceptably low $\sum e_i^2$ has been mentioned. In all
cases examined, however, no individual error was ever
prohibitively large. Figure 13 is a table of values of
analogue error corresponding to a value of $\sum e_i^2$ for various
values of $q$ and various numbers of iterations.

Figure 10

Figure 11
Learning Curve for q = 0.003

100 ITERATIONS

                          Analogue error $e_i$
  q     $\sum e_i^2$   C's (desired +10)                    T's (desired -10)
 0.01    24.27      2.01   1.?5   2.45  -0.13  -2.13     0.?7  -1.85  -0.?0   0.11   1.01
 0.06    10.93      1.3?   1.??   1.31  -1.18  -0.?1    -0.07  -0.56   0.43   0.41   0.03
 0.?1    43.82      0.58   4.?1   0.10  -2.11   1.33    -1.74   1.01   0.03  -2.02  -0.13

50 ITERATIONS

                          Analogue error $e_i$
  q     $\sum e_i^2$   C's (desired +10)                    T's (desired -10)
 0.01    45.34     -2.1?   3.23  -0.3?   2.25  -0.84    -1.12   1.41   3.17   1.53  -0.156
 0.06    ?          1.33   3.53   1.08  -1.11  -0.38     0.34  -0.12   0.51   0.58   0.037
 0.?1    76.21      ?      6.27   1.41  -1.15   2.74    -3.48  -0.01  -1.52  -3.03  -1.21

Figure 13

If training were continued indefinitely, the errors, $e_i$ and
$\sum e_i^2$, could, presumably, be brought close to zero. This is
hardly practical, however, and a more realistic approach is
to limit the number of iterations and accept an analogue error.
Here the concept of a "dead zone" could be introduced. Thus,
if the desired value of output is +10, a tolerance is placed
on this value so that if, after training, a pattern produces
an output of, say, $+10 \pm 3$ it would be classed in the +10
group. In addition to making an excessive number of training
iterations unnecessary this may enable Adaline to recognize
patterns which it has not seen during training, such as a
training pattern which has been contaminated by noise. If
Adaline is used with a threshold device, two modifications
could be made. Consider the Adaline shown in Figure 14.

Figure 14

The threshold device yields an output of +1 if its input is
greater than +7 and an output of -1 if its input is more
negative than -7. Only the lower half of the dead zone, or
tolerance, is used here. An analogue input of 10 to the
threshold element is aimed at but, in this case, if values
of output, $a_i$, are greater than 10, adaption to reduce
this to 10 would seem pointless. A new training scheme would
be to adjust the weights (by the old rule) to make the output
approach 10 from below. If the output is greater than 10
no adaption should take place.
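The threshold device and the modified training rule can be sketched as follows. This is an illustration only, with hypothetical function names. Two points are assumptions not stated in the text: the device is taken to give 0 (no decision) for inputs between -7 and +7, and the one-sided rule is applied symmetrically to the -10 class.

```python
def threshold_device(analogue, level=7.0):
    """+1 above +level, -1 below -level, and 0 (assumed: no decision) in between."""
    if analogue > level:
        return 1
    if analogue < -level:
        return -1
    return 0


def one_sided_error(output, desired):
    """Error used for adaption when outputs beyond the desired value are left alone."""
    if desired > 0 and output >= desired:
        return 0.0  # already past +10: no adaption
    if desired < 0 and output <= desired:
        return 0.0  # assumed symmetric case for the -10 class
    return desired - output


decisions = [threshold_device(a) for a in (12.5, 7.5, 2.0, -8.0)]
errors = [one_sided_error(a, 10.0) for a in (8.0, 11.0)]
```

The value returned by `one_sided_error` would simply replace the analogue error in the old adjustment rule, so that no change is made once the output has passed the desired value.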

CHAPTER II
A  NOTE  ON  MINIMUM  SQUARE  ERROR  ADAPTION 


There are many methods of approximating a polynomial by
a straight line or curve. One common method is described by
texts on numerical analysis [8] and receives the name
"minimum square error approximation." If the curve shown in
Figure 15 is a polynomial, $p(x)$, and is approximated by a
straight line, $\ell(x) = ax + b$, then the constants, $a$ and $b$,
can be found for which $\ell(x)$ satisfies the minimum mean square
error criterion, i.e. the sum, $\sum e_i^2 = e_1^2 + e_2^2 + \cdots + e_i^2 + \cdots$,
where $e_i = p(x_i) - \ell(x_i)$, must be minimized. If $S = \sum e_i^2$,
then the values of $a$ and $b$ which minimize $S$ can be found
using the techniques of differential calculus.
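The calculus step can be written out explicitly. The following is a sketch, assuming the polynomial is sampled at $N$ points $x_1, \ldots, x_N$ (the symbol $N$ is introduced here for illustration) and writing $S = \sum_i e_i^2$ for the sum of squared errors.

```latex
\begin{aligned}
S(a, b) &= \sum_{i=1}^{N} \bigl( p(x_i) - a x_i - b \bigr)^2 , \\
\frac{\partial S}{\partial a} &= -2 \sum_{i=1}^{N} x_i \bigl( p(x_i) - a x_i - b \bigr) = 0 , \qquad
\frac{\partial S}{\partial b} = -2 \sum_{i=1}^{N} \bigl( p(x_i) - a x_i - b \bigr) = 0 , \\
a \sum_{i=1}^{N} x_i^2 + b \sum_{i=1}^{N} x_i &= \sum_{i=1}^{N} x_i\, p(x_i) , \qquad
a \sum_{i=1}^{N} x_i + b\, N = \sum_{i=1}^{N} p(x_i) .
\end{aligned}
```

The last line is a pair of linear equations in $a$ and $b$. The same pattern of setting each partial derivative of the summed squared error to zero carries over to the threshold and weights of Adaline.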


Figure 15

Consider the problem of training Adaline. Patterns, $[x]_1$,
$[x]_2, \ldots, [x]_m$, have desired outputs of $d_1, d_2, \ldots, d_m$.
At any point during the training phase, when the output
of Adaline in response to these patterns is $a_1, a_2, \ldots, a_m$,
the errors are $e_1, e_2, \ldots, e_m$. Adaption requires the
adjustment of the threshold and weights in such a way as to
make $a_1, a_2, \ldots, a_m$ approach $d_1, d_2, \ldots, d_m$. By carrying
over the concepts of polynomial approximation the sum of the
squares of these errors can be found and, by using this as
a criterion, the weights can be found which minimize $\sum_{i=1}^{m} e_i^2$.

2.1 Analytical Minimization of $\sum e_i^2$

Before training Adaline is assumed to have the threshold
and weights of value, $w_0, w_1, w_2, \ldots, w_j, \ldots, w_n$. If
the $m$ training patterns are now presented the outputs are
given by the following equations:

$$\begin{aligned}
a_1 &= x_{11} w_1 + x_{12} w_2 + \cdots + x_{1j} w_j + \cdots + x_{1n} w_n + w_0 \\
&\;\;\vdots \\
a_m &= x_{m1} w_1 + x_{m2} w_2 + \cdots + x_{mj} w_j + \cdots + x_{mn} w_n + w_0
\end{aligned}$$

The expressions for the errors are:

$$e_1 = d_1 - a_1, \quad \ldots, \quad e_m = d_m - a_m$$

The sum, $\sum_{i=1}^{m} e_i^2$, is formed and this must be minimized by
adjusting the threshold and the weights. If $\sum e_i^2$ is called $S$,
then

$$S = \sum_{i=1}^{m} \Bigl( d_i - \sum_{j=1}^{n} w_j x_{ij} - w_0 \Bigr)^2$$


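Analytically, the values of the threshold and weights which
yield the minimum value of $S$ can be found by taking the
partial derivatives of $S$ with respect to the threshold and
weights and equating the derivatives to zero, i.e.

$$\frac{\partial S}{\partial w_0} = 0, \quad \frac{\partial S}{\partial w_1} = 0, \quad \ldots, \quad \frac{\partial S}{\partial w_n} = 0$$

There are $n + 1$ equations and $n + 1$ unknowns and, if these
equations are consistent, they can be solved for $w_0, w_1,
w_2, \ldots, w_n$, the values of threshold and weights which produce
minimum $S$. The equations are:

$$\begin{aligned}
\sum_{i=1}^{m} \Bigl( d_i - \sum_{k=1}^{n} w_k x_{ik} - w_0 \Bigr) &= 0 \\
\sum_{i=1}^{m} \Bigl( d_i - \sum_{k=1}^{n} w_k x_{ik} - w_0 \Bigr) x_{i1} &= 0 \\
&\;\;\vdots \\
\sum_{i=1}^{m} \Bigl( d_i - \sum_{k=1}^{n} w_k x_{ik} - w_0 \Bigr) x_{in} &= 0
\end{aligned}$$

They can be expressed compactly, with the $j$th equation given
by

$$\frac{\partial S}{\partial w_j} = 0 \quad \text{or} \quad \sum_{i=1}^{m} e_i x_{ij} = 0 \qquad \text{for } j = 1, 2, \ldots, n$$

and

$$\frac{\partial S}{\partial w_0} = 0 \quad \text{or} \quad \sum_{i=1}^{m} e_i = 0 \qquad \text{for } j = 0$$
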
It is instructive to consider a few examples of this
technique as applied to patterns in two space. For such patterns
there will be three equations, with three unknowns, to be
used to find the best weights, i.e.

$$\sum_{i=1}^{m} e_i = 0, \qquad \sum_{i=1}^{m} e_i x_{i1} = 0, \qquad \sum_{i=1}^{m} e_i x_{i2} = 0$$


In the example illustrated in Figure 16 the patterns are:

$$[x]_1: \; x_{11} = 1, \; x_{12} = -1, \; d_1 = d; \qquad [x]_2: \; x_{21} = -1, \; x_{22} = 1, \; d_2 = -d$$

Figure 16

The errors are given by:

$$\begin{aligned}
e_1 &= d - w_1 + w_2 - w_0 \\
e_2 &= -d + w_1 - w_2 - w_0
\end{aligned}$$

and the three equations are:

$$\begin{aligned}
w_0 &= 0 \\
w_1 - w_2 &= d \\
w_1 - w_2 &= d
\end{aligned}$$

These equations yield an infinite number of solutions for
the weights so that the weights define an infinite number
of hyperplanes which pass through the point $x_1 = 0$, $x_2 = 0$. If $w_2$ is
chosen to be $d$, then $w_1 = 2d$ and the hyperplane has
equation $x_2 = -2x_1$. This is the same result as was obtained
in example 1 in section 1.2.

In the example shown in Figure 17 the patterns are:

$$[x]_1: \; x_{11} = 1, \; x_{12} = 1, \; d_1 = d; \qquad [x]_2: \; x_{21} = 1, \; x_{22} = -1, \; d_2 = -d; \qquad [x]_3: \; x_{31} = -1, \; x_{32} = 1, \; d_3 = -d$$

Figure 17

The errors are given by:

$$\begin{aligned}
e_1 &= d - w_1 - w_2 - w_0 \\
e_2 &= -d - w_1 + w_2 - w_0 \\
e_3 &= -d + w_1 - w_2 - w_0
\end{aligned}$$

and the equations to be solved for the weights and threshold are:

$$\begin{aligned}
w_1 + w_2 + 3w_0 &= -d \\
3w_1 - w_2 + w_0 &= d \\
w_1 - 3w_2 - w_0 &= -d
\end{aligned}$$

Solution yields: $w_0 = -d$, $w_1 = d$, $w_2 = d$ and
hence, the equation of the dividing hyperplane is $x_2 = 1 - x_1$.
Again this agrees with the results of example 2 in 1.2.

In the example shown in Figure 18 the patterns are:

$$[x]_1: \; x_{11} = 1, \; x_{12} = 1, \; d_1 = d; \qquad [x]_2: \; x_{21} = 1, \; x_{22} = -1, \; d_2 = -d;$$
$$[x]_3: \; x_{31} = -1, \; x_{32} = 1, \; d_3 = -d; \qquad [x]_4: \; x_{41} = -1, \; x_{42} = -1, \; d_4 = -d$$

Figure 18

The equations for the errors are:

$$\begin{aligned}
e_1 &= d - w_1 - w_2 - w_0 \\
e_2 &= -d - w_1 + w_2 - w_0 \\
e_3 &= -d + w_1 - w_2 - w_0 \\
e_4 &= -d + w_1 + w_2 - w_0
\end{aligned}$$

and the three equations to be solved for the weights are:

$$\begin{aligned}
4w_0 + 2d &= 0 \\
4w_1 - 2d &= 0 \\
4w_2 - 2d &= 0
\end{aligned}$$

and so $w_0 = -d/2$, $w_1 = d/2$, $w_2 = d/2$. The equation
of the hyperplane which separates the two classes of patterns
is $x_2 = -x_1 + 1$. This apparent contradiction of the
result of example 3 of section 1.2 can be explained in the
following way. Minimizing $S$ analytically results in finding
the particular hyperplane which will separate the patterns
into two classes in such a way as to minimize $\sum e_i^2$. The
equations of section 1.2 describe an adaptive scheme which will
separate patterns with a precisely located hyperplane only if the
error, $e_i$, for each pattern and $\sum e_i^2$ can be reduced to zero.



Finally, in the example shown in Figure 19 the patterns are:

Figure 19

The three equations for the weights yield $w_1 = w_2 = w_0 = 0$.
This means, in effect, that no hyperplane exists which will
separate the patterns. This agrees with the result of example 4
in section 1.2.
2.2  Steepest  Descent  Minimization  of  ^-^, 

If  the  function,  o  =  (£l «  has  a  minimum  it  can  be  found 


.=  i 


by  an  iterative  procedure,.   One  possible  method  is  described  here. 
If the threshold and weights have initial values, $W_0$, $W_1$, ...,
$W_i$, ..., $W_n$, and these weights do not define the minimum,
then all the patterns, $[x]_1$, $[x]_2$, ..., $[x]_j$, ..., $[x]_m$ are
presented. The function, $S$, and the $n+1$ partial derivatives
are calculated. The method of steepest descent involves
evaluating the magnitude of the gradient vector at the point
defined by $W_0$, $W_1$, ..., $W_i$, ..., $W_n$ and descending a short
distance along the gradient vector towards the minimum of $S$
by adjusting the weights, $W_0$, $W_1$, ..., $W_n$. The adjustment
made to the $i$th weight is proportional to $\partial S/\partial W_i$ and is
$\Delta W_i = q \sum_j e_j x_{ji}$. The adjustment to the threshold is
$\Delta W_0 = q \sum_j e_j$. $q$ is a constant of proportionality and
affects the size of the increments to all the weights and the
threshold. It should be noted that the increments, $\Delta W_0$,
$\Delta W_1$, ..., $\Delta W_i$, ..., $\Delta W_n$ are not necessarily equal. At each
new set of values, $W_0$, $W_1$, ..., $W_i$, ..., $W_n$, the patterns are
again presented and the process is repeated. This process is
continued until the minimum is found.

A memory is required to implement this process. After presenting
each pattern the error, $e_j$, and the $n$ elements of the $j$th pattern
must be stored until all the patterns are presented so that $S$,
$\Delta W_0$ and $\Delta W_i$ can be calculated. The calculation of $\Delta W_0$, ...,
$\Delta W_n$ could be accomplished using $n+1$ totaling registers to
collect $\sum e_j$, $\sum e_j x_{j1}$, ..., $\sum e_j x_{jn}$. It is for this reason
that a modified form of steepest descent (which does not require a
memory) is used when training Adaline to find the values of
weights which yield a minimum for $S$.
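The procedure of this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the value $q = 0.1$, the zero initial weights and the number of steps are chosen here and do not come from the original study. The patterns are those of the Figure 18 example with $d = 1$.

```python
def batch_steepest_descent(patterns, desired, q=0.1, steps=200):
    """Adjust threshold w0 and weights w only after all patterns are presented."""
    n = len(patterns[0])
    w0, w = 0.0, [0.0] * n          # assumed starting point
    for _ in range(steps):
        # Present every pattern and keep its error (the "memory").
        errors = [d - (w0 + sum(wi * xi for wi, xi in zip(w, x)))
                  for x, d in zip(patterns, desired)]
        # These sums play the role of the n + 1 totaling registers.
        total_e = sum(errors)
        total_ex = [sum(e * x[i] for e, x in zip(errors, patterns))
                    for i in range(n)]
        w0 += q * total_e
        w = [wi + q * t for wi, t in zip(w, total_ex)]
    return w0, w


d = 1.0
patterns = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
desired = [d, -d, -d, -d]
w0, w = batch_steepest_descent(patterns, desired)
print(w0, w)
```

With these assumed settings the weights approach the analytical values $W_0 = -d/2$, $W_1 = W_2 = d/2$ found for that example.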
2.3 Modified Steepest Descent

The sum of the squares of the errors, $S$, is given by:

$S = \sum_{j=1}^{m} e_j^2$

The adaptive scheme described in this section attempts to
find the weights which yield the minimum for $S$ without the
need for a memory. The first pattern is presented and $e_1^2$
is calculated. This is treated as the function to be minimized
and the steepest descent method is applied to this. Calling
$f_1 = e_1^2 = (d_1 - a_1)^2$, the derivatives are:

$\partial f_1/\partial W_i = -2 e_1 x_{1i}$, $i = 1, \ldots, n$
$\partial f_1/\partial W_0 = -2 e_1$
It should be noted that the derivatives are equal in magnitude.
As was mentioned in section 2.2, the method of steepest descent
involves making an adjustment, which is proportional to $\partial f_1/\partial W_i$,
to each of the weights, i.e. $\Delta W_i = q e_1 x_{1i}$, and the adjustment
to the threshold is $\Delta W_0 = q e_1$, where $q$ is a constant.
The adjustments, $\Delta W_0$, $\Delta W_1$, ..., $\Delta W_n$, are equal in magnitude.
These adjustments are made and then the second pattern is
applied and $f_2 = e_2^2$ is calculated. Equal adjustments are
then made by applying one step of the steepest descent procedure
to $f_2$. This process is repeated until the $m$ patterns
have all been presented. If the minimum has not been
found, presentation of the patterns and adjustment continues as
described. The "real" steepest descent method uses the function, $S$,
as the criterion. The criterion for the modified method,
however, changes from $f_1$ to $f_2$ to $f_3$ ... to $f_m$. If the
constant, $q$, is such that the adjustments are very small
then $S \approx f_1 + f_2 + \ldots + f_m$, where $f_j = e_j^2$ and is evaluated
after $j-1$ adjustments*. The minimum of $S$ can be found
by continuing to apply this method. In many cases the true
minimum (as obtained by the analytical method of examples
1 and 2 of section 2.1) is found but in some cases (as in
example 3 of section 2.1) no hyperplane defining the
minimum is found. The hyperplane oscillates around a mean
plane which would define the minimum of $S$.

The  adaption  scheme  can  be  defined  as  follows: 

Present the patterns in turn and, after each pattern
is presented, adjust all of the weights and the
threshold by an equal amount in a direction such
that the error will be reduced. Adjustment is to
take place if the error, on application of a pattern,
is not zero.
This  scheme  is  most  frequently  called  "Minimum  Square  Error 
Adaption"  in  the  literature.   The  main  difference  between 
"Minimum  Square  Error  Adaption"  and  the  method  of  steepest  descent 
is  that  in  the  former  adjustment  is  made  after  each  pattern  is 
presented  whereas  in  the  latter,  all  the  patterns  are  presented 
before  any  adjustment  can  be  made. 


* In "real" steepest descent $S$ would be given by $S = f_1 + f_2 + \ldots + f_m$
where $f_1$, $f_2$, ..., $f_m$ were evaluated before any
adjustment was made.
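The difference between the two methods can be illustrated with a short Python sketch of the modified scheme, in which the weights are adjusted after every pattern. The function name, $q = 0.01$, the zero initial weights and the number of cycles are assumptions made for this illustration. The patterns are those of example 3 of section 2.1 with $d = 1$, for which the errors cannot all be reduced to zero.

```python
def modified_steepest_descent(patterns, desired, q=0.01, cycles=500):
    """Adjust after each pattern; return the weights seen during the last cycle."""
    n = len(patterns[0])
    w0, w = 0.0, [0.0] * n          # assumed starting point
    last_cycle = []
    for cycle in range(cycles):
        for x, d in zip(patterns, desired):
            e = d - (w0 + sum(wi * xi for wi, xi in zip(w, x)))
            # Every weight and the threshold move by the same magnitude, q * |e|.
            w0 += q * e
            w = [wi + q * e * xi for wi, xi in zip(w, x)]
            if cycle == cycles - 1:
                last_cycle.append((w0, list(w)))
    return last_cycle


d = 1.0
patterns = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
desired = [d, -d, -d, -d]
last_cycle = modified_steepest_descent(patterns, desired)

mean_w0 = sum(s[0] for s in last_cycle) / len(last_cycle)
mean_w1 = sum(s[1][0] for s in last_cycle) / len(last_cycle)
mean_w2 = sum(s[1][1] for s in last_cycle) / len(last_cycle)
spread_w0 = max(s[0] for s in last_cycle) - min(s[0] for s in last_cycle)
print(mean_w0, mean_w1, mean_w2, spread_w0)
```

In this sketch the weights never settle: they keep moving within each cycle, while their mean stays close to the analytical minimum. This is the oscillation about a mean plane described above.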
CHAPTER III
ADALINE IN CONTROL OF A PLANT


3.1 The Equations of the Plant

In this chapter Adaline is taught to recognize the
behaviour of a stable plant and to produce the correct control
signal. The plant chosen is a second order system which
consists of a motor and a load. The power supplied to the
motor is controlled by a relay. The system is stable by
virtue of unity and velocity feedback. It is shown in Figure 19.

Figure 19
The differential equations of the plant can be obtained,
as a function of time, from the block diagram. Hence the
output, $c(t)$, and the output rate, $\dot{c}(t)$, can be calculated,
as functions of time, for various initial values of $c(t)$
and $\dot{c}(t)$, rather than supplying the plant with various desired
outputs, $r(t)$. The voltage which is applied to the plant
is $\pm V$ volt, depending on the sign of the relay input, $d(t)$.
From Figure 19,

$d(t) = r(t) - c(t) - K_t \dot{c}(t)$   3.1

and choosing to set $r(t) = 0$, the switching condition for
the voltage, $V$, is given by:

$\dot{c}(t) = -\frac{1}{K_t} c(t)$   3.2

The equations for output and output rate are:

$c(t) = Kt + C_0 + \frac{C_1 - K}{b} + \frac{K - C_1}{b} e^{-bt}$   3.3

$\dot{c}(t) = K - b \frac{K - C_1}{b} e^{-bt}$   3.4

where $C_0 = c(0)$, the initial value of $c(t)$ at $t = 0$,
$C_1 = \dot{c}(0)$, the initial value of $\dot{c}(t)$ at $t = 0$,

and  K   =±K<V  • 
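Equations 3.3 and 3.4 can be evaluated directly. The Python sketch below is an illustration only: the function name and argument names are chosen here, and the default $b = 1$ is the value quoted for the control system in this section.

```python
import math


def plant_response(t, c0, c1, k, b=1.0):
    """Output c(t) and output rate c_dot(t) for a constant relay output k."""
    decay = math.exp(-b * t)
    c = k * t + c0 + (c1 - k) / b + (k - c1) / b * decay   # equation 3.3
    c_dot = k - (k - c1) * decay                           # equation 3.4
    return c, c_dot


# Example: initial conditions c(0) = 8, c_dot(0) = 2 with K = -10.
c, c_dot = plant_response(0.5, c0=8.0, c1=2.0, k=-10.0)
print(c, c_dot)
```

At $t = 0$ the function returns the initial conditions, and for large $t$ the output rate tends to $K$, as expected for a motor driven by a constant voltage.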

A convenient way of presenting this information is to use
the phase plane. If the system error, $e(t)$, is defined as
$e(t) = r(t) - c(t)$, where $r(t)$ is the desired output
and $c(t)$ the actual output, then a plot of error, $e(t)$,
versus error rate, $\dot{e}(t)$, is called a trajectory in the phase
plane. Since "inputs" are applied to the system as initial
conditions of $c(t)$, $\dot{c}(t)$ with $r(t) = 0$ then $e(t) = -c(t)$,
$\dot{e}(t) = -\dot{c}(t)$. Hence equations 3.3 and 3.4 define a
trajectory in the phase plane which describes the system
behaviour for a given set of initial conditions. Since $K$ can
be positive or negative, depending on the sign of the input
to the relay, $d(t)$, the trajectory obeys two equations, A and
B. This is indicated in Figure 20. The switching condition
has been stated in equation 3.2. This is a line with
a slope, $-1/K_t$, in the phase plane. The values chosen for
the control system were: $K_t = 0.2$, $K = \pm 10$, and $b = 1$.

Figure 20 (trajectory B: $K < 0$)
Using the analytical solutions for $c(t)$ and $\dot{c}(t)$ a computer
program was constructed to produce sample trajectories in
the phase plane for comparison with the attempts produced
by Adaline. One of them is shown in Figure 21.

The objective of training Adaline is to enable it to
recognize combinations of $e(t)$ and $\dot{e}(t)$ in order that it
may produce a response, $a(t)$, which is close to the correct
input to the relay, $d(t)$. During training it is desirable
to present many combinations of $e(t)$ and $\dot{e}(t)$, so that when
Adaline later acts as a controller it will have enough
"experience" (or have seen most of the phase plane) to
make a correct decision and produce a good value of $a(t)$
to drive the relay. To train Adaline in a practical
situation various initial conditions, $c(0)$ and $\dot{c}(0)$, would
be set and the resulting trajectories would generate values
of $e(t)$, $\dot{e}(t)$ and $d(t)$, which would be presented to
Adaline. This is illustrated in Figure 22.

Figure 22

It is unnecessary to follow this procedure exactly in a digital
computer simulation. Since it is known that the input to the
relay, $d(t)$, is given by $d(t) = e(t) + K_t \dot{e}(t)$ everywhere in
the phase plane, Adaline is, in this study, presented with a large
number of randomly selected points with coordinates, $e(t)$
and $\dot{e}(t)$, and is trained to produce an analogue output, $a(t)$,
of value as close as possible to $d(t)$ for these points.
3.2  Coding  of  the  Phase  Plane  and  the  Training  of  Adaline 

A  practical  Adaline  is  limited  in  size  by  having  a 
finite  number  of  weights  and  the  number  of  weights  limits 
the  number  of  patterns  which  Adaline  can  attempt  to  separate^  ~\, 
When  the  control  system  responds  to  a  set  of  initial  conditions, 
the error, $e(t)$, and error rate, $\dot{e}(t)$, vary continuously
until a steady state condition is reached. If the values
of $e(t)$, $\dot{e}(t)$ and $d(t)$ are sampled frequently, patterns
could be assigned to the points with coordinates, $e(t)$
and $\dot{e}(t)$, in the phase plane. This would yield a very
large number of patterns and an Adaline with a large number
of weights would be required. Although, in this study, the
values of $e(t)$ and $\dot{e}(t)$ were sampled frequently, the above
situation was avoided by dividing the phase plane into a
small number of regions to which certain patterns were
assigned. This ensures that only a small number of patterns
will be presented to Adaline. It was also necessary to
choose the patterns, or codes, in such a way that it will be
possible for Adaline to separate groups of the patterns into
the appropriate two classes. If this last requirement is met,
it is said that the patterns are linearly separable.

A  linearly  separable  code  which  has  been  used  by  Widrow  L^J 
is indicated in Figure 23. The variable, $x$, which can take
on an infinite number of values between $\alpha$ and an upper limit,
is coded by choosing a reference or origin at $\alpha$ on the $x$ axis
and dividing the region to the right of this reference into
segments. The total number of segments determines the number
of digits in the code. The code assigned to each region is
obtained by moving the digit 1 one space to the left as the
variable, $x$, moves from one segment to another. This ensures
that there is a difference of two digits in the code between
any two regions, which may be a factor in explaining the
separability of the code (Treado [6]).

The phase plane can now be divided into squares, for
example, and a pattern assigned to each square. One method of
coding $e(t)$ and $\dot{e}(t)$ separately has been discussed. If this
were done, a group of points in a square in the phase plane
could have a coded value for $e(t)$, and a coded value for $\dot{e}(t)$.
A pattern which could be associated with points lying within
a given square in the phase plane can be obtained by placing the
coded values for $e(t)$ and $\dot{e}(t)$ side by side. This means that
if $e(t)$ has a value such that it is in segment 0001, and $\dot{e}(t)$
has a value such that it is in segment 0100, the pattern for
the point with coordinates, $e(t)$, $\dot{e}(t)$ in the given square
is 00010100.
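The coding scheme can be sketched in Python as follows. This is a minimal illustration under assumptions: the function names are chosen here, the range of $-10$ to $10$ for both variables is the one used in the tests described in this section, the code for $e(t)$ is placed before the code for $\dot{e}(t)$, and a value exactly on a segment boundary is assigned to the segment on its right.

```python
def code_value(value, reference, upper, segments):
    """Return the code for value as a string of digits, e.g. '0010'."""
    width = (upper - reference) / segments
    index = int((value - reference) / width)
    index = max(0, min(index, segments - 1))   # keep end points inside the range
    digits = ["0"] * segments
    digits[segments - 1 - index] = "1"         # the 1 moves left segment by segment
    return "".join(digits)


def code_point(e, e_dot, segments, limit=10.0):
    """Pattern for a phase-plane point: code of e followed by code of e_dot."""
    return (code_value(e, -limit, limit, segments)
            + code_value(e_dot, -limit, limit, segments))


print(code_point(-9.0, 2.0, 4))
print(code_point(-3.0, 7.0, 4))
```

With four segments per axis, the first call gives the pattern 00010100 used as the example above. Before a pattern is presented to Adaline, each 0 would be replaced by $-1$.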

In all the tests described here the possible initial
conditions for the control system were limited so that the
maximum absolute value of $e(t)$ and $\dot{e}(t)$ was 10. This section
of the phase plane was then divided into squares. By choosing
a reference at $e(t) = -10$, $\dot{e}(t) = -10$, $e(t)$ and $\dot{e}(t)$ were
suitably coded, and hence a pattern was assigned to each
square. A division of the phase plane into 25 squares is
shown in Figure 24. The fineness with which the phase plane
is divided determines the number of digits per pattern and
this determines the number of weights in the Adaline.
If the $e(t)$ and $\dot{e}(t)$ axes are divided into $n$ sections
each, then the pattern for each square has $2n$ digits. It
can be seen that the size of Adaline increases linearly with
the fineness of the division. The more alarming feature is
that the number of patterns increases with the square of
the sections, $n^2$, and the question arises as to whether
Adaline is capable of separating $n^2$ patterns into distinct
classes when the number of weights is $2n$.

Figure 24

By using various sizes of grids on the $e(t)$, $\dot{e}(t)$
plane, several Adalines were trained (according to the Mean
Square Error scheme of Chapter I) to "duplicate" the performance
of the control system of section 3.1. As in Chapter I, each
of the zeros in the patterns was replaced by the digit -1.
A pseudo-random number generator was used to produce values
of $e(t)$ and $\dot{e}(t)$ which were then coded. The function, $d(t)$,
was calculated from equation 3.1 and, together with the
coded values of $e(t)$ and $\dot{e}(t)$, presented to Adaline.
After each pattern was presented the weights were adjusted
and the next pattern read in. After a suitable training
period the weights were fixed and Adaline was tested with
one pattern from each square of the grid.
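The training procedure just described might be simulated along the following lines. This is a sketch under assumptions, not the program used in the study: the function names, the seed, the value $q = 0.002$, the number of points and the use of Python's `random` module are all choices made for this illustration. The coding places the code for $e(t)$ before the code for $\dot{e}(t)$.

```python
import random

K_T = 0.2      # velocity feedback constant of section 3.1
LIMIT = 10.0   # |e| and |e_dot| are limited to 10


def segment(value, n):
    """Index (0 .. n-1) of the segment holding value, counted from -LIMIT."""
    width = 2 * LIMIT / n
    return min(int((value + LIMIT) / width), n - 1)


def pattern(e, e_dot, n):
    """2n inputs of +1 or -1: the code of e followed by the code of e_dot."""
    x = []
    for value in (e, e_dot):
        code = [-1.0] * n                        # zeros replaced by -1
        code[n - 1 - segment(value, n)] = 1.0    # the digit 1 moves left
        x.extend(code)
    return x


def train(n, q, points, seed=1):
    """Present random phase-plane points, adjusting after each one."""
    rng = random.Random(seed)
    w0, w = 0.0, [0.0] * (2 * n)
    for _ in range(points):
        e = rng.uniform(-LIMIT, LIMIT)
        e_dot = rng.uniform(-LIMIT, LIMIT)
        x = pattern(e, e_dot, n)
        d = e + K_T * e_dot                      # relay input with r(t) = 0
        err = d - (w0 + sum(wi * xi for wi, xi in zip(w, x)))
        w0 += q * err
        w = [wi + q * err * xi for wi, xi in zip(w, x)]
    return w0, w


def output(w0, w, e, e_dot, n):
    """Analogue output of the trained Adaline for one phase-plane point."""
    return w0 + sum(wi * xi for wi, xi in zip(w, pattern(e, e_dot, n)))


w0, w = train(n=4, q=0.002, points=20000)
print(output(w0, w, -2.5, 7.5, 4))   # one point from the centre of a square
```

Testing the trained weights with one point from each square, and noting the sign of the output, gives the switching line which this simulated Adaline has produced.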

The output of the trained Adaline, $a(t)$, is positive or
negative, depending on which pattern is presented. If a line
is drawn in the phase plane separating squares for which $a(t)$
is positive from those with negative $a(t)$, this can be
called the switching line which Adaline has produced, and
the training of Adaline can be considered as the training of
a function generator.

Several points were examined using the scheme which
has been outlined:

1. The effect of the adjustment constant, $q$, on
the switching line which is obtained when the
training process is complete.

2. The effect of the number of training cycles on
the final switching line.

3. Convergence of values of weights and threshold
as training proceeds.

4. The capacity of Adaline.

These four points were examined using a digital computer
simulation of the coding scheme and Adaline as described
in the previous paragraph. A training cycle consists of
fixing the adjustment constant, $q$, and presenting
Adaline with patterns generated by a fixed number of
points in the phase plane. The training cycle is repeated
for various numbers of points and various $q$.

Assuming that Adaline is trained with a very large
number of randomly distributed points in the phase plane
the switching line which Adaline will produce, when trained,
can be predicted.

For a grid of 16 squares, and hence an Adaline of
eight weights and a threshold, Adaline's attempt at
duplicating a switching line of slope, $-5$, is shown
in Figure 25. For points occurring in all the squares,
except those numbered 2, 6, 11 and 15, the input to the
relay, $d(t)$, is either positive for all points in a square
or negative for all points in the square. There is no
difficulty in training Adaline to differentiate between
such squares. Points in squares 2, 6, 11 and 15 can,
however, present Adaline with conflicting information
during the training phase. In square 2, for example, all
points, when coded, yield the same pattern, 00101000. Since
the switching line passes through this square the points, and
hence the pattern, can correspond to both positive and
negative values of $d(t)$. Adaline has, therefore, the problem
of placing the same pattern into two different classes. Using
the assumption of a statistically uniform distribution of
points it can be seen that Adaline will see more values of
negative $d(t)$ than positive $d(t)$ during training.

Figure 25

After training, therefore, the weights and threshold
will have values such that the pattern of square 2 is
classed negative. The same argument applies to squares 6, 11
and 15. It is on this basis that a prediction of Adaline's
attempt at producing a switching line is made.

For a grid of 25 squares, and an Adaline with ten
weights and a threshold, the prediction of the switching
line produced, after training, is shown in Figure 26.
In this case the weights and threshold, after training,
should be such that squares 3 and 8 will yield positive
output and squares 18 and 23 negative output. The output
of Adaline in response to a pattern from square 13 is uncertain.
Of the patterns presented to Adaline from square 13, statistically,
half of them would correspond to positive values of $d(t)$
and half of them to negative values of $d(t)$.

For a grid of 36 squares and an Adaline of 12 weights
and one threshold, the prediction of the switching line
produced by Adaline, after training, is shown in Figure 27.
From previous arguments it can be seen that, for squares 9
and 15, Adaline will produce a negative output after training
and, for squares 22 and 28, Adaline will produce a positive
output. Since the actual switching line bisects squares 3
and 34 the output Adaline will produce, after training, when
presented with the patterns of squares 3 and 34 is uncertain.

Figure 27
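The majority argument used in these predictions can be checked with a short Python sketch. Because $d(t) = e(t) + K_t \dot{e}(t)$ is linear, the sign of $d$ at the centre of a square is the sign taken by the larger part of that square, and a square whose centre lies on the switching line is bisected. The function name is chosen here, and the numbering of squares (row by row from the top left, with $\dot{e}$ largest in the top row) is an assumption inferred from the square numbers quoted in the text, since the figures are not reproduced.

```python
K_T = 0.2
LIMIT = 10.0


def predicted_classes(n, tolerance=1e-9):
    """Map each square number to +1, -1, or 0 (bisected, so uncertain)."""
    width = 2 * LIMIT / n
    classes = {}
    for row in range(n):            # row 0 is the top row (largest e_dot)
        for col in range(n):        # col 0 is the leftmost column (smallest e)
            e_c = -LIMIT + (col + 0.5) * width
            e_dot_c = LIMIT - (row + 0.5) * width
            d_c = e_c + K_T * e_dot_c          # relay input at the centre
            square = row * n + col + 1
            if abs(d_c) < tolerance:
                classes[square] = 0
            elif d_c > 0:
                classes[square] = 1
            else:
                classes[square] = -1
    return classes


for n in (4, 5, 6):
    classes = predicted_classes(n)
    uncertain = [s for s, c in classes.items() if c == 0]
    print(n * n, "squares, uncertain:", uncertain)
```

Under the assumed numbering, this reproduces the uncertain squares named above: none for 16 squares, square 13 for 25 squares, and squares 3 and 34 for 36 squares.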
Examples of the switching lines actually produced by
Adaline, after training, are shown in Figure 28.

Figure 28

Variations of these switching lines were produced by using
different values of the adjustment constant, $q$, and with
different numbers of training cycles. The four points
mentioned earlier in the section are now discussed.

The size of adjustment can be considered as affecting
the position of the switching line in the following way.
For values of $q$ close to or less than the optimum value
(as discussed in section 1.3) the switching line produced,
after training on a large number of points, is very close
to the predicted line. Variations occurred when the values
of $q$ were greater than the optimum value. When the
adjustment is large, the last few patterns presented before
training is stopped will have a disproportionately large
effect on the weights. The last few patterns may not be
distributed evenly among the squares and hence Adaline may
be biased and produce a peculiar switching line. Two examples
are shown in Figure 29. When the size of adjustment is
small this effect is less.
The effect on the switching line of the number of points
presented during training is similar to that due to different
values of constant, $q$, but is not so pronounced. The
squares which are affected most are those for which, during
training, there are both positive and negative values of $d(t)$.
The total number of points presented determines the number
which occur in a given square. Although the random number
generator produces a statistically random spread of points,
it may be cut off at a stage when there has been a distinct
bias toward one region of the phase plane. The number of
points presented during testing varied between 100 and 1000.
In the case of a grid of 16 squares the switching line is
unaffected. In the grid of 25 squares, square 13 of Figure 26
yields outputs both negative and positive, depending on the
number of training points. In the case of a grid of 36
squares the outputs produced by squares 3 and 34 oscillate
between positive and negative values.

When examined during training the weights and threshold
are seen to oscillate about a mean value. Given a value of $q$
less than or equal to the optimum value, the weights and
threshold oscillate around values which yield a switching
line close to the predicted switching line. The oscillation
is a result of the presentation of conflicting information
from squares cut by the switching line of the plant.

Widrow  L^J   has  stated  that  "the  statistical  capacity 
of Adaline is twice the number of weights." By this "rule"
the Adaline with eight weights would be able to classify the
16 patterns which it receives but the 10- and 12-weight
Adalines would have difficulty in attempting to classify the
25 and 36 patterns which they receive. The results obtained
indicate that this does not appear to be the case. In an
attempt to discover if there is a limit to the number of
patterns which Adaline can separate, Adalines with 14, 16,
18, 20, 22 and 24 weights were trained. Several thousand
points were presented to each Adaline during training and,
in all cases, the switching line produced after training
was very close to that predicted. Variations occurred only
when squares were cut by the switching line of the control
system. Since an upper limit did not appear (an Adaline with
24 weights separated 144 patterns into two classes) further
study of Adaline's capacity is indicated.

3.3 Adaline Acting as the Controller

A digital computer simulated Adaline was now placed in
control of the simulated plant described in section 3.1.
The desired output, $r(t)$, was set to zero, and 'inputs' were
applied by setting initial conditions, $c(0)$ and $\dot{c}(0)$, at
the output shaft. Trajectories in the phase plane were
obtained for various initial conditions.

A block diagram showing Adaline connected to the plant
is shown in Figure 30. The weights and threshold chosen
for these tests were those which gave switching lines as
close as possible to the predicted lines of Figures 25,
26, 27.
The essence of computer calculations is as follows.
The initial conditions $c(0)$, $\dot{c}(0)$ are presented to the
error, error-rate calculator and yield $e(0)$, $\dot{e}(0)$. The
values $e(0)$, $\dot{e}(0)$ are presented to the coder and time, $t$, is
set to zero. A pattern, which depends on the square in which
the point with coordinates, $e(t)$, $\dot{e}(t)$, lies, is generated
and applied to Adaline. Adaline produces an output, $a$, and
this causes the output from the relay to assume a definite
sign. Using equations 3.3 and 3.4, and a small increment of
time (10 ms), $c(t)$ and $\dot{c}(t)$ are calculated. These values define
the second point of the phase trajectory. $c(t)$ and $\dot{c}(t)$
are now applied to the error, error-rate calculator and the
calculations are repeated. The values of $e(t)$ and $\dot{e}(t)$
are stored for printing and for drawing a graph of $e(t)$ against
$\dot{e}(t)$, the phase trajectory. The input to the relay, $a(t)$,
is examined after each increment of time. If it has changed
sign, the trajectory has crossed the switching line and the
sign of $K$ in equations 3.3 and 3.4 is then reversed.
Time is set to zero and the initial conditions, $c(0)$, $\dot{c}(0)$,
assume the values of $c(t)$ and $\dot{c}(t)$ that existed immediately
before switching occurred. Since the increments of time are
small, these values of $c(t)$ and $\dot{c}(t)$ are close to the values
at the time of switching. The calculations described are
continued until it is clear that the trajectory is either
converging to zero error or to a steady state value of error.
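The calculation loop described above can be sketched in Python. Several things here are assumptions made for illustration: the function names, the use of the ideal switching function $e + K_t \dot{e}$ as a stand-in for a trained Adaline, and the simplification of restarting equations 3.3 and 3.4 from the current state at every 10 ms increment (which gives the same values as letting time run on between switchings, because the equations are exact for constant $K$).

```python
import math

B = 1.0        # plant constant b
K_MAG = 10.0   # magnitude of K
K_T = 0.2      # velocity feedback constant
DT = 0.01      # time increment of 10 ms


def step(c, c_dot, k, dt=DT, b=B):
    """Advance equations 3.3 and 3.4 by dt from the state (c, c_dot)."""
    decay = math.exp(-b * dt)
    new_c = k * dt + c + (c_dot - k) / b + (k - c_dot) / b * decay
    new_c_dot = k - (k - c_dot) * decay
    return new_c, new_c_dot


def ideal_controller(e, e_dot):
    """Stand-in for Adaline's output: the relay input of the original plant."""
    return e + K_T * e_dot


def simulate(c0, c_dot0, controller, steps):
    """Return the phase trajectory as a list of (e, e_dot) points."""
    c, c_dot = c0, c_dot0
    trajectory = []
    for _ in range(steps):
        e, e_dot = -c, -c_dot                    # r(t) = 0
        trajectory.append((e, e_dot))
        # The sign of the controller output sets the sign of K.
        k = K_MAG if controller(e, e_dot) >= 0 else -K_MAG
        c, c_dot = step(c, c_dot, k)
    return trajectory


trajectory = simulate(8.0, 2.0, ideal_controller, steps=1000)
print(trajectory[0], trajectory[-1])
```

Any function of $e$ and $\dot{e}$ can be passed as `controller`, for example one that codes the point and returns the output of a trained Adaline, so that the resulting trajectories can be compared with those of the original plant.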

In one test, an Adaline with eight weights and a threshold
was used to control the plant. The values of the weights and
threshold were those which yielded the switching line of
Figure 28(a) and are given in Figure 31. This is the switching
line of a relay operated plant, with no velocity feedback.
The system in Figure 30 should exhibit the same response.
The  trajectory  in  response  to  initial  conditions,  C-(o)  -  "O  » 

$\dot{c}(0) = -2$ is shown in Figure 31. The response is highly
oscillatory, but stable, and the error is, eventually, reduced
to zero. A similar response would be obtained for other
initial conditions.

In considering an Adaline with ten weights and a threshold,
corresponding to the switching line of Figure 28(b), it can
be seen that the two vertical portions of the switching
line will result in the relay switching early in the course of
a trajectory, so that, after switching, the trajectory will
tend towards the origin. The horizontal portion of the
switching line, however, causes trouble. Consider a trajectory
which crosses the switching line. If it is trajectory B of
Figure 20, then the relay will change sign and the trajectory
becomes trajectory A of Figure 20. Trajectory A then crosses
the switching line and switching again occurs. This chattering
will continue until the trajectory crosses the vertical line.
This is best illustrated by Figure 32 for which the initial
conditions are $c(0) = 8$, $\dot{c}(0) = 2$. The final solution
is damped but settles to a steady state error of 2. Tests
were then run with many values of initial conditions (see also
Figure 33) but the main features of the trajectories were
always as described. It should be noted that all trajectories,
except one, will result in chattering and a final steady
state error of 2. The exception is illustrated in Figure 34 where
the initial conditions are $c(0) = -8$, $\dot{c}(0) = -2$. It can be
seen that switching occurs early, and, after switching, the
trajectory tends towards the origin. The trajectory continues
past the $e(t)$ axis, meets the horizontal switching line,
chatters and, finally, settles to a steady state error of 2.
There will, however, be a trajectory, close to that described,
which would arrive at the origin after the relay has switched
once.

Tests were also made using an Adaline with 12 weights
and a threshold, corresponding to the switching line of
Figure 28(c). As in the example with 10 weights and a
threshold, the two vertical portions of the switching line will
cause early switching and the two horizontal portions are so
placed that the relay will chatter until $e(t)$ and $\dot{e}(t)$ reach
values such that the trajectory crosses the central, vertical
line. The trajectory will, in all cases, settle to a steady
value of zero error and error rate. This is illustrated in
Figure 35, for which the initial conditions are $c(0) = 0$,
$\dot{c}(0) = 2$. It can be seen that certain values of initial
conditions would result in a trajectory switching only on the
central portion of the switching line. This is illustrated in
Figure 36, for which the initial conditions are $c(0) = 2$,
$\dot{c}(0) = -5$.

Finer division of the phase plane would result in
more accurate reproduction of the desired switching line of the
controlled plant, but Adaline's attempt would still consist
of vertical and horizontal portions. The examples given
show the type of response which can be expected.
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CHAPTER  IV 
CONCLUSIONS 

The.  work  described  in  Chapter  III  has  shown  that  it 
is  possible  for  an  Adaline  to  learn  to  control  a  simple 
plant.   Its  ability  as  a  controller  is  poor,  however,  and 
in  this  section  methods  of  improving  the  basic  scheme  are 
considered*  The  concept  of  training  a  netwprk  to  perform 
a  desired  function  is  also  discussed. 
4,1  Adaline  and  a  Dead  Zone 

In all cases discussed in section 3.3 the combination
of Adaline and the plant resulted in a stable control system,
but Adaline's control ability was poor in regions near the
origin of the phase plane, with lightly damped oscillations
either about the origin or about a steady state error. One
method of eliminating this would be to use Adaline in
conjunction with a relay with a small dead zone. When the
trajectory enters the central square, or squares, Adaline
would be disconnected and, with the relay in the dead zone,
the output shaft of the plant would coast. This method would
probably yield a better response than can be obtained using
Adaline with an ideal relay. At worst a limit cycle condition
could persist but, if the central square were small, this
could be tolerated. In the physical realization of this
scheme, a source of difficulty would lie in deciding when the
trajectory leaves and enters the central square so that
Adaline could be connected or disconnected.
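
The switching logic of this scheme can be sketched as follows. This is
only an illustration: the function name, the half-width of the central
square, and the representation of Adaline's decision as +1 or -1 are
assumptions made here, not details taken from the study.

```python
def relay_command(error, error_rate, adaline_output, half_width=1.0):
    """Return the relay drive: 0 inside the central square, else Adaline's decision.

    adaline_output is Adaline's +1/-1 decision for the current pattern.
    half_width (hypothetical value) is half the side of the central square.
    """
    inside = abs(error) <= half_width and abs(error_rate) <= half_width
    if inside:
        return 0           # dead zone: Adaline disconnected, output shaft coasts
    return adaline_output  # outside: Adaline drives the relay as before
```

The test on `inside` is where the practical difficulty noted above would
arise, since it must detect reliably when the trajectory crosses the
boundary of the central square.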

4.2  Coarse  and  Fine  Division  of  the  Phase  Plane 

Another method of improving Adaline's response near
the origin could consist of dividing the central square,
or squares, into the same number of squares as the major
grid. The pattern codes of squares in the large grid could
then also be assigned to corresponding squares in a fine
grid. This is illustrated in Figure 37 for a division of the
phase plane into 25 squares. If the weights and threshold
correspond to the switching line of Figure 28(b), by
referring to section 3.3, it can be seen that most
trajectories will now finish with a steady state error of
2/5, instead of 2, since the grid is now five times finer
near the origin. This scheme can be realized by altering
the coding box in Figure 30. When the trajectory enters
the central square, the values of $\epsilon(t)$ and $\dot{\epsilon}(t)$ are
amplified by 5 before being applied to the coder. This is
indicated in Figure 38. As in the scheme of section 4.1,
difficulties would arise in deciding when the trajectory
enters and leaves the central square. It should also be
noted that this method is applicable only when Adaline is
trained on a linear switching line.
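
A minimal sketch of the altered coding box is given below, assuming a
5 by 5 grid over a square region of the phase plane. The extent of the
coded region (`span`) and the function names are hypothetical; only the
amplification by 5 inside the central square comes from the text.

```python
import math

def cell_index(value, span=5.0, n=5):
    """Index (0..n-1) of the grid division containing value on one axis."""
    width = 2.0 * span / n
    i = math.floor((value + span) / width)
    return min(max(i, 0), n - 1)   # clamp points outside the coded region

def coarse_fine_cell(error, error_rate, span=5.0, n=5):
    """Return (grid, column, row), where grid is 'coarse' or 'fine'."""
    col = cell_index(error, span, n)
    row = cell_index(error_rate, span, n)
    centre = n // 2
    if col == centre and row == centre:
        # Inside the central square: amplify both signals by n and
        # reuse the same coder, which yields the fine grid.
        return ("fine",
                cell_index(n * error, span, n),
                cell_index(n * error_rate, span, n))
    return ("coarse", col, row)
```

Because the same coder is reused, a fine square receives the same pattern
as the corresponding coarse square, which is why the scheme suits only a
linear switching line through the origin.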

4.3 Polar Coding Scheme

In Chapter III the controlled plant had a switching
line which Adaline found difficult to reproduce due to the
coarseness of the division of the phase plane. One possible
approach to the division problem, in the case of this specific
plant, would be to divide the phase plane into regions bounded
by concentric circles centred at the origin and by radial
lines extending from the origin. The resulting 'curved'
rectangles are those to which patterns can be assigned
using a coding scheme such as that discussed in section 3.2.
This is best illustrated by considering the example with
six  angular  divisions  of  TyC,  radians  and  six  radial  divisions 
in  Figure  39. 

If a point with coordinates $\epsilon(t)$, $\dot{\epsilon}(t)$ in the phase
plane is to be coded and assigned a pattern, it is first
necessary to convert the coordinates of the point into
polar coordinates (a radius and an angle). This complication
is trivial as far as a computer simulation is concerned, but if
hardware implementation is to be considered, the additional
complexity would make this method of dividing the phase
plane less attractive than rectilinear division.
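
The conversion and the assignment of a point to a 'curved' rectangle
might be sketched as follows. The outer radius `r_max`, the choice of
equal sectors spanning the full circle, and the function name are
assumptions made for illustration.

```python
import math

def polar_cell(error, error_rate, r_max=6.0, n_radial=6, n_angular=6):
    """Return (ring, sector) for a phase-plane point under a polar division."""
    r = math.hypot(error, error_rate)
    theta = math.atan2(error_rate, error) % (2.0 * math.pi)  # angle in [0, 2*pi)
    # Points beyond r_max are assigned to the outermost ring.
    ring = min(int(r / (r_max / n_radial)), n_radial - 1)
    sector = min(int(theta / (2.0 * math.pi / n_angular)), n_angular - 1)
    return ring, sector
```

In a simulation the two library calls cost almost nothing; in hardware
they correspond to the additional complexity mentioned above.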

Using a digital computer simulation, an Adaline with
12 weights and a threshold was trained on the controlled
plant of section 3.1. The program was very similar to that
described in section 3.2. A pseudo-random number generator
produced the coordinates, $\epsilon(t)$, $\dot{\epsilon}(t)$, of a point in the
phase plane. They were then converted to polar coordinates,
and presented to the coder. The resulting pattern and $d(t)$,
as calculated from equation 3.1, were presented to Adaline
and adjustments were then made to the weights. The calculation
was then repeated for different points. After a few training
cycles the weights were fixed and the response of Adaline,
when presented with a pattern from each square, examined.
The switching line produced, after training with 700 points
and an adjustment constant of 0.03, is shown in Figure 40.
As has been discussed in section 3.2, the distribution of
points affects the training, particularly the response from
patterns corresponding to squares or areas cut by the switching
line of the plant. The switching line produced, after
training with 300 points and an adjustment constant of 0.03,
again illustrates this point (Figure 41).
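
The training procedure described above can be sketched roughly as
follows. Several details are assumptions made for illustration and are
not taken from the study: the pattern code (one +1 among -1 elements for
the ring, and likewise for the sector), the linear switching line used
in place of equation 3.1, the sampled region of the phase plane, and the
use of the unnormalized Widrow-Hoff (least mean square) adjustment.

```python
import math
import random

def polar_pattern(error, error_rate, r_max=6.0, n=6):
    """Code a point as 2n elements of +1/-1: n for the ring, n for the sector."""
    r = math.hypot(error, error_rate)
    theta = math.atan2(error_rate, error) % (2.0 * math.pi)
    ring = min(int(r / (r_max / n)), n - 1)
    sector = min(int(theta / (2.0 * math.pi / n)), n - 1)
    return ([1.0 if i == ring else -1.0 for i in range(n)]
            + [1.0 if j == sector else -1.0 for j in range(n)])

def desired_output(error, error_rate, k=1.0):
    """Hypothetical linear switching line standing in for equation 3.1."""
    return 1.0 if error + k * error_rate < 0.0 else -1.0

def train_adaline(n_points=700, rate=0.03, seed=0):
    """Adjust 12 weights and a threshold on pseudo-random phase-plane points."""
    rng = random.Random(seed)
    weights = [0.0] * 13                     # last entry is the threshold weight
    for _ in range(n_points):
        e, edot = rng.uniform(-6.0, 6.0), rng.uniform(-6.0, 6.0)
        x = polar_pattern(e, edot) + [1.0]   # constant +1 input drives the threshold
        d = desired_output(e, edot)
        s = sum(w * xi for w, xi in zip(weights, x))
        # Move each weight in proportion to the error between d and the sum.
        weights = [w + rate * (d - s) * xi for w, xi in zip(weights, x)]
    return weights

def adaline_decision(weights, error, error_rate):
    """Adaline's +1/-1 output for a point, with the weights held fixed."""
    x = polar_pattern(error, error_rate) + [1.0]
    return 1.0 if sum(w * xi for w, xi in zip(weights, x)) >= 0.0 else -1.0
```

Calling `train_adaline()` and then evaluating `adaline_decision` on one
point from each region gives the learned switching line; repeating this
with `n_points=300` shows how the distribution of training points
affects the regions cut by the plant's switching line.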

It can be seen that an Adaline adequately trained in this
manner would be able to control the plant. The results of a
computer program, using the weights and threshold which
correspond to the switching line of Figure 40, verified this.
The phase trajectory was identical to that of a second order
plant, compensated with unity and velocity feedback, and with
the tachometer constant determined by the slope of the
switching line in Figure 40.

4.4 General Remarks

The main purpose of this study has been to verify that
a pattern recognition device, if suitably trained, can take
the place of the controller of a plant. Some of the possibilities
of control using Adaline, and some of the associated problems,
have been revealed in the course of this work.

It  is  instructive  to  consider  the  peculiar  nature  of 
this  particular  recognition  problem  and  to  this  end  comparison 
is  made  with  a  simple  form  of  the  weather  forecasting  scheme 
discussed  by  Hu  j  icT] „      Weather  maps,  containing  information 
on barometric pressure over a wide area, are the source of the
patterns to be presented to Adaline. The weather (wet or dry)
on the following day at location B is the "desired output".
The map is divided into squares and the squares generate +1
or -1 pattern elements according as the pressure is higher
or lower than normal. The pattern, consisting of the array
of pattern elements, and the resulting weather at B are presented
to Adaline. If wet is assigned to -10 and dry to +10, Adaline
is trained to produce the appropriate output. Training consists
of presenting Adaline with many weather maps, and the resulting
conditions at B. After training is complete, Adaline can,
given a weather map, estimate the weather on the following
day. In this recognition problem, given a large amount of
information (over the area of the map), the response at a
particular place can be evaluated. In the control system,
the recognition problem is the following: given information
on the error and error rate of the plant at an instant
(the coordinates of a point in the phase plane) what is the
input to the plant to force the error and the error rate to
tend to zero? It is not possible in this case to examine the
overall situation to assist in making the correct decision.
To date this problem has been approached by assigning patterns
to regions of the phase plane, deciding into which region a
point falls, and making a decision on receipt of this pattern.
It can be seen that a very small amount of information is
presented to Adaline at each stage. A possible area of study
would be to search for a better way of presenting the information
about the point, and possibly about the desired final point,
to Adaline at each stage. For the control problem studied
the specific difficulties encountered are now discussed.

As has been mentioned, Adaline found it difficult to
reproduce the linear switching line of the controlled plant
because of the coarseness of division of the phase plane.
It would, however, require a very fine division for Adaline
to be able to reproduce a line accurately. The resulting
size of the Adaline would make practical realization, using
memistors  L  M J  as  the  adaptive  elements,  prohibitively  expensive 
even where only two states are coded. The polar coding scheme
offers a partial solution to the problem of reproducing this
particular switching line. There may also be cases in which
the controlled plant has an unknown and complex switching
line, which would be better tackled by a polar division
than by rectilinear division. The coarseness of the division
and the resulting undesirable features of the trajectory
pose a difficult problem for which a solution is not
immediately obvious.

Directly associated with the problem of the coarseness
of the divisions is the rapid increase in the number of
patterns, and in the number of digits per pattern, with
increasing fineness of the quantization. It was shown,
experimentally, that 144 patterns generated in the quantized
phase plane could be successfully separated using 24 weights
and a threshold. The statement on the capacity of Adaline by
Widrow [3], discussed in section 3.2, deals, presumably, with
all possible patterns which can be generated by permutations
of the digits applied to Adaline. In presenting Adaline with
144 patterns from the phase plane, only a small and carefully
controlled sample of the total number of patterns which can be
obtained from the permutations of the 24 digits is used.
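
To indicate how small this sample is, a rough calculation can be made
on the assumption that each of the 24 digits takes one of two values
(+1 or -1), so that 2^24 distinct patterns are possible:

```latex
\[
\frac{144}{2^{24}} = \frac{144}{16\,777\,216} \approx 8.6 \times 10^{-6}
\]
```

On this assumption fewer than one in a hundred thousand of the possible
patterns is ever presented to Adaline.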

In this study Adaline was trained on a plant which was
known to perform well with a conventional controller. It did,
in fact, prove possible to train Adaline to act as a (rather
crude) controller, but if this is to remain more than an academic
exercise, the question must be asked whether this idea can be
applied to more complicated situations. If Adaline is to be
considered seriously as a controller in some given situation, it
must be able to control competently, and it must show a definite
advantage over conventional controllers. There is one such
area where Adaline might be used: where the aim is to duplicate
the response of a human operator, or, more generally, where
the controller must learn to control in the absence
of information about the dynamics of the plant and its
existing controller. In the case of the human operator the
main problem would be to present Adaline with the correct
signals. The operator can be considered as having a switching
surface (or hypersurface) which depends on his past experience
in acting on observations of many variables. It may prove
possible to train Adaline to duplicate his performance so that
Adaline could eventually take over the operator's task. A
by-product of this process may be the identification of some of
the characteristics of the controller by examining Adaline's
weights at the end of the training cycle.

The problem of training Adaline to act as the controller
of an unknown and, as yet, uncontrolled plant is more difficult.
A set of weights could be placed in Adaline, and the phase
space, with certain of the plant variables as its coordinate
axes, could be quantized and coded. On starting the plant
from certain initial conditions, Adaline would produce a
response and drive the plant. As the resulting trajectory
passes through different coded regions, different responses
would be produced which would force the plant to behave in
a certain way. If the resulting trajectory is undesirable,
Adaline must be adjusted in some way to produce a satisfactory
trajectory. The main problem would be to decide how to devise
a suitable training procedure.

Although the possible uses for Adaline seem plausible,
they depend on solving the problem of Adaline's poor response
in certain areas of the phase plane. The schemes mentioned in
sections 4.1 and 4.2 for improving Adaline's response near
the origin do not attack the problem at its source, and an
attempt should be made to improve the performance of Adaline
itself. To date the information on the state of the plant
has been presented to Adaline as a pattern after some form
of coding. An area of research would be to consider more
sophisticated ways of presenting Adaline with information
about the state of the plant and, possibly, presenting
additional information about the desired final state.